Introduce PR Benchmark Workflow #903

lapp0 · 2024-05-18T15:39:47Z

Fixes #883

Changes

This change-set configures asv within the repo and adds the asv_benchmark_pr.yml workflow, which comments benchmark comparisons on each open PR.

tests/benchmarks/ has been moved to benchmarks/ and converted from pytest-benchmark to asv format.
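
For readers unfamiliar with the asv format: asv discovers benchmarks by naming convention. Methods prefixed `time_` are timed, methods prefixed `peakmem_` have their peak memory recorded, `params` fans each method out per parameter, and `setup` runs outside the measured region. The sketch below is a minimal illustration, not the code in this PR. The class name, the patterns, and the use of `re.compile` as a stand-in workload are all assumptions.

```python
import re

# Hypothetical patterns; the real benchmarks in benchmarks/ define their own cases.
REGEX_SAMPLES = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "time": r"(0?[1-9]|1[0-2]):[0-5]\d\s?(am|pm)?",
}


class ExampleRegexBenchmark:
    # asv runs every time_*/peakmem_* method once per entry in `params`.
    params = list(REGEX_SAMPLES.keys())
    param_names = ["pattern_name"]

    def setup(self, pattern_name):
        # Runs before the measurement and is excluded from the timing.
        self.pattern = REGEX_SAMPLES[pattern_name]

    def time_compile(self, pattern_name):
        # asv reports the wall-clock time of this call.
        re.compile(self.pattern)

    def peakmem_compile(self, pattern_name):
        # asv reports the peak memory of the process while running this call.
        re.compile(self.pattern)
```

Unlike pytest-benchmark, no fixture is passed in: the method body itself is the measured unit, so anything that should not be timed belongs in `setup`.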

Behavior

Comparison is between PR HEAD and outlines-dev/outlines@main HEAD.
The use of --interleave-rounds -a repeat=3 in asv continuous mitigates variance due to environmental factors described in Check benchmarks in CI #883, but triples the runtime compared to a single pass.
"The median time from all samples collected in all rounds is used as the final measurement result."
Total benchmark workflow runtime (repeat=3): 23 minutes (should be close to test run time - 10 minutes)
Runs once per commit once the run_benchmarks label is applied
Upload benchmarks as artifact and link via $GITHUB_STEP_SUMMARY
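
To illustrate the quoted statement about the median, here is a small sketch of why pooling samples across interleaved rounds helps. It is not asv's implementation. The sample values and the function name `pooled_median` are made up for illustration, and the sketch assumes that a transient slowdown on the runner affects one round of both revisions rather than a single revision only.

```python
from statistics import median


def pooled_median(rounds):
    """Return the median of all samples collected across all rounds."""
    samples = [sample for round_samples in rounds for sample in round_samples]
    return median(samples)


# Hypothetical timings in seconds; round 2 ran while the runner was busy.
base_rounds = [[1.00, 1.01], [1.30, 1.32], [1.00, 0.99]]
head_rounds = [[1.01, 1.00], [1.31, 1.29], [1.00, 1.01]]

ratio = pooled_median(head_rounds) / pooled_median(base_rounds)
print(f"ratio: {ratio:.2f}")
```

Because the rounds alternate between the two revisions, the noisy round hits both of them, and the median discards it as an outlier instead of letting it skew the comparison.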

Examples

Times differ between 1% and 4% due to random variation: Workflow for ASV Benchmarks in PR lapp0/outlines#16 (comment)
Demo of Benchmark Output for PR with Performance Regression: Add time.sleep(0.1) to build_regex_from_schema() lapp0/outlines#18 (comment)

Out of Scope

With this infrastructure we can create useful historical performance dashboards such as https://asv-runner.github.io/asv-collection/pandas/. This requires a stable, dedicated machine that is guaranteed to be idle during benchmark runs.

Repo Configuration Work

Set up branch protection to ensure merging is disabled unless the new workflow is run
Add run_benchmarks label

Removed:

For this workflow we need to set up an access token for the repo with appropriate permissions:

contents: read
- for retrieving compared revisions
pull-requests: read and write
- for commenting

Then create a new asv-benchmarks environment, and a secret with key = GH_TOKEN, value = access token.

Security

I recommend the following setting so that arbitrary workflows cannot be run from malicious PRs:

https://github.com/outlines-dev/outlines/settings/actions

Text field

peter-evans/create-or-update-comment@*,
peter-evans/find-comment@*,
pre-commit/action@*,

TODO:

asv configuration
PR comment workflow
migrate benchmarks from pytest-benchmark to asv
~~harden workflow security (e.g. a PR with a new workflow using GH_TOKEN could spam the repo using the pull-requests write permissions)~~
use https://github.com/airspeed-velocity/asv/pull/1263/files
update docs
~~[ ] Optimize workflow run time (setup is majority of time, not benchmark execution)~~
- We can't improve performance without abandoning --interleave-rounds
receive commentary

@rlouf / @brandonwillard could you please share your thoughts on features / changes you'd like to see before this is ready for review?

lapp0 · 2024-05-18T15:53:39Z

benchmarks/asv.conf.json

+	"python -m build --wheel -o {build_cache_dir} {build_dir}"
+    ],
+    "environment_type": "virtualenv",
+    "show_commit_url": "https://github.com/lapp0/outlines/commit/",


https://github.com/outlines-dev/outlines/commit/

This change needs to be made.

brandonwillard · 2024-05-23T17:20:51Z

Can we do without the token/permissions if we get rid of the PR comment feature? We really don't need a comment added to a PR if there's a workflow run with the output that we can check.

Also, we need to confirm that this approach will block merging if the benchmarks don't pass. Ideally we won't need to run the benchmarks on every commit, but would still need them to be run in order to merge. In that scenario, it would be up to the maintainers to add a tag that runs the benchmarks.

kc611

Hi @lapp0 🙂

This approach looks really good for Outlines' benchmarking requirements. Here are a few of my thoughts on this PR.

.github/workflows/asv_benchmark_pr.yml

benchmarks/asv.conf.json

lapp0 · 2024-05-31T22:08:09Z

Thanks for the feedback!

As requested I've

Removed the comment functionality; results are now shown in the BENCHMARK RESULTS section of the workflow
Updated the workflow so that it runs only when manually dispatched, or on every commit once the run_benchmarks label is applied
Updated the asv.conf.json build_command to be consistent with the current build command defaults
Forced the workflow to fail if the performance decline for any test exceeds 10%
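
The 10% failure rule can be pictured as a check on the Ratio column of the comparison output. The following is a hedged sketch rather than the workflow's actual logic, which is not shown in this thread. `find_regressions` and `THRESHOLD` are hypothetical names, and the ratios are copied from the example run in this comment.

```python
THRESHOLD = 1.10  # fail when "after" is more than 10% slower than "before"


def find_regressions(ratios, threshold=THRESHOLD):
    """Return the benchmarks whose after/before ratio exceeds the threshold."""
    return {name: ratio for name, ratio in ratios.items() if ratio > threshold}


# Ratios taken from the example run; names shortened for readability.
ratios = {
    "time_json_schema_to_fsm('complex_schema')": 1.03,
    "time_regex_to_guide('ssn')": 1.00,
    "time_json_schema_to_regex('complex_schema')": 1255.5,
}

regressions = find_regressions(ratios)
if regressions:
    print("Performance degradation detected!")

# A non-zero exit code is what makes the workflow step fail.
exit_code = 1 if regressions else 0
```

With a threshold of 1.10, the 1% to 4% run-to-run variation reported earlier in the thread stays below the failure line.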

Rendered Docs

~~Requires #929 merge first for tests to pass.~~

Example: Terrible performance regression resulting in failure:

PR: Add time.sleep(0.1) to build_regex_from_schema() lapp0/outlines#18
Workflow: https://github.com/lapp0/outlines/actions/runs/9332020809/job/25687346356?pr=18
BENCHMARK RESULTS section in workflow:

Benchmarks that have stayed the same:

| Before [`538f77a`] | After [`31b44ca`] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 4.16±0.01s | 4.29±0.02s | 1.03 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm('complex_schema') |
| 2.08±0s | 2.13±0.08s | 1.03 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm('simple_schema') |
| 4.89±0.02s | 4.92±0.01s | 1.01 | bench_numba_compile.NumbaCompileBenchmark.time_compile_numba |
| 591M | 592M | 1 | bench_regex_guide.MemoryRegexGuideBenchmark.peakmem_regex_to_guide('complex_span_constrained_relation_extraction') |
| 497M | 497M | 1 | bench_regex_guide.MemoryRegexGuideBenchmark.peakmem_regex_to_guide('simple_phone') |
| 628±3ms | 623±4ms | 0.99 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('complex_phone') |
| 5.69±0s | 5.69±0.01s | 1 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('complex_span_constrained_relation_extraction') |
| 353±6ms | 352±4ms | 1 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('date') |
| 284±0.9ms | 283±2ms | 1 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('email') |
| 245±0.7ms | 241±3ms | 0.99 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('ip') |
| 179±2ms | 181±2ms | 1.01 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('simple_phone') |
| 129±0.9ms | 129±0.6ms | 1 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('ssn') |
| 124±0.7ms | 123±2ms | 0.99 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('time') |
| 411±2ms | 409±0.5ms | 1 | bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('url') |

Benchmarks that have got worse:

| Change | Before [`538f77a`] | After [`31b44ca`] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| + | 79.9±2μs | 100±0.01ms | 1255.5 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_regex('complex_schema') |
| + | 43.8±0.5μs | 100±0ms | 2288.37 | bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_regex('simple_schema') |

Performance degradation detected!

Error: Process completed with exit code 1.

brandonwillard

Just some small comments/updates and we can merge this.

brandonwillard · 2024-06-03T21:38:31Z

benchmarks/asv.conf.json

+	"python -m build --wheel -o {build_cache_dir} {build_dir}"
+    ],
+    "environment_type": "virtualenv",
+    "show_commit_url": "https://github.com/lapp0/outlines/commit/",


This change needs to be made.

brandonwillard · 2024-06-03T21:43:15Z

benchmarks/bench_numba_compile.py

+
+from .common import clear_outlines_cache, setup_tokenizer
+
+outlines.disable_cache()


We need to contain global state changes like this in the set-up/tear-down of these benchmarks, or—better yet—avoid them completely.

lapp0 · 2024-06-04T06:34:47Z

@brandonwillard I've introduced and tested a new outlines.caching.cache_disabled context manager. It is applied as a decorator to all benchmark tests.

brandonwillard requested a review from kc611 May 21, 2024 22:20

brandonwillard assigned brandonwillard and unassigned brandonwillard May 21, 2024

brandonwillard added the tests Linked to library tests label May 21, 2024

kc611 reviewed May 24, 2024

This was referenced May 29, 2024

Added benchmarks for larger regex fsm and runtime benchmarks for the same #925

Open

Historical Benchmark Performance Dashboard #928

Open

lapp0 force-pushed the introduce-asv-ci-workflow-883 branch 17 times, most recently from ded19a6 to 90f30fa Compare May 29, 2024 16:05

brandonwillard reviewed May 29, 2024

brandonwillard added run-benchmarks and removed run-benchmarks labels May 29, 2024

lapp0 force-pushed the introduce-asv-ci-workflow-883 branch 2 times, most recently from edd9789 to 22771de Compare May 31, 2024 22:07

lapp0 marked this pull request as ready for review May 31, 2024 22:09

brandonwillard force-pushed the introduce-asv-ci-workflow-883 branch from 22771de to ecc08c5 Compare May 31, 2024 22:59

lapp0 requested review from brandonwillard and kc611 June 1, 2024 18:09

lapp0 mentioned this pull request Jun 1, 2024

Fix null byte \x00 issue in byte level fsm resulting in KeyError in BetterFSM::FSMInfo #930

Merged

lapp0 force-pushed the introduce-asv-ci-workflow-883 branch from b7ab26c to f1e2bbb Compare June 3, 2024 07:57

brandonwillard requested changes Jun 3, 2024

lapp0 force-pushed the introduce-asv-ci-workflow-883 branch from f1e2bbb to 309e21f Compare June 4, 2024 06:33

lapp0 added 4 commits June 4, 2024 01:33

ASV PR bench workflow, pytest-bench -> ASV, add peakmem tests

ee85483

ensure workflow fails if benchmark degredation >10%

02d8a84

disable outlines cache localized to the benchmarks scope

d59a239

use outlines-dev/outlines for asv.conf.json show_commit_url

0ea382d

lapp0 force-pushed the introduce-asv-ci-workflow-883 branch from 309e21f to 0ea382d Compare June 4, 2024 06:33

brandonwillard approved these changes Jun 4, 2024

brandonwillard merged commit b5a2073 into dottxt-ai:main Jun 4, 2024
7 checks passed
